soundness and completeness


Thought of Search: Planning with Language Models Through The Lens of Efficiency

Neural Information Processing Systems

Among the most important properties of algorithms investigated in computer science are soundness, completeness, and complexity. These properties, however, are rarely analyzed for the vast collection of recently proposed methods for planning with large language models. In this work, we alleviate this gap. We analyze these properties of using LLMs for planning and highlight that recent trends abandon both soundness and completeness, gaining only inefficiency in return. We propose a significantly more efficient approach that can, at the same time, maintain both soundness and completeness. We demonstrate this on four representative search problems, comparing to the LLM-based solutions from the literature that attempt to solve them. We show that by using LLMs to produce the code for the search components, we can solve the entire datasets with 100% accuracy with only a few calls to the LLM. In contrast, the compared approaches require hundreds of thousands of calls and achieve significantly lower accuracy. We argue for a responsible use of compute resources, urging the research community to investigate sound and complete LLM-based approaches that uphold efficiency.
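
To make "code for the search components" concrete, here is a minimal Python sketch of what LLM-produced components could look like for the 24 Game, one of the representative search problems in this line of work; the function names and the plain depth-first driver are illustrative assumptions, not the paper's actual code.

    from itertools import combinations

    def successors(state):
        """Combine any two remaining numbers with +, -, *, / (state is a tuple)."""
        succ = []
        for i, j in combinations(range(len(state)), 2):
            rest = [state[k] for k in range(len(state)) if k not in (i, j)]
            a, b = state[i], state[j]
            results = {a + b, a - b, b - a, a * b}
            if b != 0:
                results.add(a / b)
            if a != 0:
                results.add(b / a)
            succ.extend(tuple(rest + [r]) for r in results)
        return succ

    def is_goal(state):
        """Goal: a single remaining number equal to 24 (up to float error)."""
        return len(state) == 1 and abs(state[0] - 24) < 1e-6

    def dfs(state):
        """Plain depth-first search driven entirely by the two components above."""
        return is_goal(state) or any(dfs(s) for s in successors(state))

    print(dfs((3, 3, 8, 8)))  # True: 8 / (3 - 8 / 3) = 24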


VERINA: Benchmarking Verifiable Code Generation

Ye, Zhe, Yan, Zhengxu, He, Jingxuan, Kasriel, Timothe, Yang, Kaiyu, Song, Dawn

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly integrated into software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation -- jointly generating code, specifications, and proofs of code-specification alignment -- offers a promising path to address this limitation and further unleash LLMs' benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often focus only on individual components rather than providing a holistic evaluation framework covering all tasks. In this paper, we introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. Verina consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o4-mini, achieves a 61.4% code correctness rate, 51.0% for specification soundness and completeness, and a mere 3.6% proof success rate (based on one trial per task). We hope Verina will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. We release our dataset on https://huggingface.co/datasets/sunblaze-ucb/verina and our evaluation code on https://github.com/sunblaze-ucb/verina.
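
As a rough illustration of what jointly generating code, a specification, and an alignment proof looks like, here is a toy Lean 4 sketch (not a task from the Verina dataset); the names and the tactic script are assumptions, written for a recent toolchain that provides the omega tactic.

    -- Toy task for illustration only (not drawn from the Verina dataset).

    -- Implementation: maximum of two natural numbers.
    def myMax (a b : Nat) : Nat :=
      if a ≥ b then a else b

    -- Specification: the result bounds both inputs and equals one of them.
    def myMaxSpec (a b r : Nat) : Prop :=
      a ≤ r ∧ b ≤ r ∧ (r = a ∨ r = b)

    -- Proof of code-specification alignment.
    theorem myMax_meets_spec (a b : Nat) : myMaxSpec a b (myMax a b) := by
      unfold myMaxSpec myMax
      by_cases h : a ≥ b <;> simp [h] <;> omega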


Automating Thought of Search: A Journey Towards Soundness and Completeness

Cao, Daniel, Katz, Michael, Kokel, Harsha, Srinivas, Kavitha, Sohrabi, Shirin

arXiv.org Artificial Intelligence

Planning remains one of the last standing bastions for large language models (LLMs), which now turn their attention to search. Most of the literature uses the language models as world models to define the search space, forgoing soundness for the sake of flexibility. A recent work, Thought of Search (ToS), proposed defining the search space with code, having the language models produce that code. ToS requires a human in the loop, collaboratively producing a sound successor function and goal test. The result, however, is worth the effort: all the tested datasets were solved with 100% accuracy. At the same time, LLMs have demonstrated significant progress in code generation and refinement for complex reasoning tasks. In this work, we automate ToS (AutoToS), completely taking the human out of the loop of solving planning problems. AutoToS guides the language model step by step towards the generation of sound and complete search components, through feedback from both generic and domain-specific unit tests. We achieve 100% accuracy, with minimal feedback iterations, using LLMs of various sizes on all evaluated domains.
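
A rough picture of such a feedback loop is sketched below in Python; the test format, the failure messages, and the llm.generate_components call are hypothetical stand-ins for illustration, not AutoToS's actual unit tests or prompting interface.

    def run_unit_tests(successor_fn, goal_fn, tests):
        """Run generic and domain-specific checks on candidate search components."""
        failures = []
        for state, expected in tests["goal_cases"]:
            if goal_fn(state) != expected:
                failures.append(f"goal test wrong on {state!r}: expected {expected}")
        for state in tests["states"]:
            for succ in successor_fn(state):
                # Generic check: a successor must actually change the state.
                if succ == state:
                    failures.append(f"successor of {state!r} returned the state unchanged")
        return failures

    def auto_refine(llm, task_description, tests, max_iterations=5):
        """Ask the model for search components, then iterate on unit-test feedback."""
        prompt = f"Write successor and goal-test functions for: {task_description}"
        for _ in range(max_iterations):
            successor_fn, goal_fn = llm.generate_components(prompt)  # hypothetical API
            failures = run_unit_tests(successor_fn, goal_fn, tests)
            if not failures:
                return successor_fn, goal_fn  # sound enough to hand to a search algorithm
            prompt = "Your previous functions failed these tests:\n" + "\n".join(failures)
        raise RuntimeError("no sound components produced within the iteration budget")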


A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Li, Yawei, Zhang, Yang, Kawaguchi, Kenji, Khakzar, Ashkan, Bischl, Bernd, Rezaei, Mina

arXiv.org Artificial Intelligence

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
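
One informal way to build intuition for the two perspectives is a precision/recall analogy over sets of features; the Python snippet below follows that analogy and is only a simplified illustration, not the perturbation-based metrics defined in the paper.

    def soundness_like(attributed, predictive):
        """Fraction of attributed features that are truly predictive (precision-like)."""
        attributed, predictive = set(attributed), set(predictive)
        return len(attributed & predictive) / len(attributed) if attributed else 0.0

    def completeness_like(attributed, predictive):
        """Fraction of truly predictive features the attribution recovers (recall-like)."""
        attributed, predictive = set(attributed), set(predictive)
        return len(attributed & predictive) / len(predictive) if predictive else 1.0

    # An attribution flags features {0, 3, 7} while the model relies on {0, 3, 5, 9}:
    print(soundness_like({0, 3, 7}, {0, 3, 5, 9}))     # ~0.67: one attributed feature is not predictive
    print(completeness_like({0, 3, 7}, {0, 3, 5, 9}))  # 0.50: half of the predictive features are missed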


Learning Features and Abstract Actions for Computing Generalized Plans

Bonet, Blai, Francès, Guillem, Geffner, Hector

arXiv.org Artificial Intelligence

Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as mappings of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the concrete actions of the individual instances, which are not necessarily shared across instances, but abstract actions defined on a set of common features. The formulation assumes that the features and the abstract actions are given. In this work, we address this limitation by showing how to learn them automatically. The resulting account of generalized planning combines learning and planning in a novel way: a learner, based on a Max SAT formulation, yields the features and abstract actions from sampled state transitions, and a FOND planner uses this information, suitably transformed, to produce the general plans. Correctness guarantees are given, and experimental results on several domains are reported.
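
To illustrate what a mapping from feature values to abstract actions looks like in practice, here is a small hand-written Python sketch for a "deliver all balls" family of instances; the feature, the abstract action, and the policy are illustrative assumptions rather than output of the Max SAT learner described above.

    def generalized_plan(n_remaining):
        """Map the feature value (balls still to deliver) to an abstract action."""
        if n_remaining > 0:
            return "deliver-one-ball"  # abstract action whose effect decrements the feature
        return None                    # feature is 0: the goal is reached

    def execute(n_initial):
        """Execute the same generalized plan on an instance of any size."""
        n, trace = n_initial, []
        while (action := generalized_plan(n)) is not None:
            trace.append(action)
            n -= 1                     # abstract effect on the feature value
        return trace

    print(len(execute(3)))   # 3 abstract actions
    print(len(execute(10)))  # 10 abstract actions: one plan, many instances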


Semantic Attachments for Domain-Independent Planning Systems

Dornhege, Christian (University of Freiburg) | Eyerich, Patrick (University of Freiburg) | Keller, Thomas (University of Freiburg) | Trüg, Sebastian (University of Freiburg) | Brenner, Michael (University of Freiburg) | Nebel, Bernhard (University of Freiburg)

AAAI Conferences

Solving real-world problems using symbolic planning often requires a simplified formulation of the original problem, since certain subproblems either cannot be represented at all or can be represented only in a way that leads to inefficiency. For example, manipulation planning may appear as a subproblem in a robotic planning context, or a packing problem can be part of a logistics task. In this paper we propose an extension of PDDL for specifying semantic attachments. This allows the evaluation of grounded predicates as well as the change of fluents by externally specified functions. Furthermore, we describe a general schema for integrating semantic attachments into a forward-chaining planner and report on our experience of adding this extension to the planners FF and Temporal Fast Downward. Finally, we present some preliminary experiments using semantic attachments.
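
The integration schema can be pictured with a short Python sketch in which a forward-chaining expansion step calls externally specified functions to evaluate a precondition and to compute fluent updates; the function names and the toy state representation are illustrative assumptions, not the actual FF or Temporal Fast Downward interfaces.

    def reachable(state, robot, obj):
        """External condition checker; a real attachment might call a motion planner."""
        return abs(state["pos"][robot] - state["pos"][obj]) <= state["arm_reach"]

    def update_arm_pose(state, robot, obj):
        """External effect applicator: compute the resulting numeric fluent values."""
        new_state = {**state, "pos": dict(state["pos"])}
        new_state["pos"][robot] = new_state["pos"][obj]
        return new_state

    def apply_pick_up(state, robot, obj):
        """One forward-chaining expansion step with attached condition and effect."""
        if not reachable(state, robot, obj):            # semantic attachment as precondition
            return None                                 # action not applicable in this state
        new_state = update_arm_pose(state, robot, obj)  # semantic attachment as effect
        new_state["holding"] = obj                      # ordinary symbolic effect
        return new_state

    state = {"pos": {"robot": 0.0, "cup": 0.8}, "arm_reach": 1.0, "holding": None}
    print(apply_pick_up(state, "robot", "cup"))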